Exploring Exercise and Heart Rate Dynamics: A Comprehensive Analysis¶

In this analysis, we aim to address several key questions using data visualization techniques.

Visualizing Data Distribution: We will examine the distribution of heart rate data during different activities/exercises.

Effects of Exercise on Heart Rate: We will investigate how heart rate changes after a month of regular exercise. Is there a noticeable difference?

Correlation Between Power and Heart Rate: We will explore the relationship between power output and heart rate to see if they are linked.

Heart Rate Variations Throughout Exercise: We will analyze how heart rate fluctuates over time during exercise.

Heart Rate and Distance: We will examine how heart rate is affected by the distance covered in various types of exercise.

Heart Rate, Power and Cadence Relationship: We will analyse the relationship between heart rate, power and cadence.

Geographical Information Systems Map: We will visualize a GIS map to track the route taken for different activities.

Assumption¶

For the purpose of this analysis, I have made the following assumptions:

Age Group: The heart rate analysis focuses on individuals aged 40-45.

Running Speed: The average running speed is assumed to be at most about 12 km/h; any activity with a higher average speed is treated as bicycling.

Power Measurement: Power is defined as the rate at which work is done on the bike. This means that both the effort applied to the pedals and the cadence (the speed at which you are pedaling) contribute to the overall power output.
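
To make the power assumption concrete, here is a minimal sketch of the standard mechanical relation: power equals crank torque times angular velocity, and angular velocity is proportional to cadence. The function name and the torque values are illustrative assumptions; the Strava file only provides the resulting power figure, not torque.

```python
import math

def pedal_power_watts(torque_nm, cadence_rpm):
    """Mechanical power (W) from crank torque (N*m) and cadence (rev/min)."""
    # convert revolutions per minute to radians per second
    angular_velocity = 2 * math.pi * cadence_rpm / 60
    return torque_nm * angular_velocity

# Same pedal effort (torque), higher cadence -> proportionally more power
low = pedal_power_watts(20.0, 60)
high = pedal_power_watts(20.0, 90)
print(round(low, 1), round(high, 1))  # 125.7 188.5
```

Raising cadence by 50% at constant torque raises power by 50%, which is why both quantities matter when reading the power column.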

Reference¶

  • Heart rate - hearth.org
  • Running speed - runnersworld.com
  • Power - Trainerroad
In [1]:
# import the required library
import warnings
import numpy as np
import pandas as pd
import statsmodels.api as sm
import datetime
import matplotlib.pyplot as plt
import seaborn as sns
import folium
from mpl_toolkits.mplot3d import Axes3D
In [2]:
# we analyse a single file, so keep its path as a constant
FILE_PATH = 'assets/strava.csv'

# suppress pandas warnings raised by in-place updates
warnings.filterwarnings("ignore")

def load_and_pre_process_data(file_path):

    # load the data into a pandas DataFrame
    df = pd.read_csv(file_path)

    # column names required for the analysis
    columns = ['timestamp', 'distance', 'speed', 'heart_rate', 'Power', 'position_lat' , 'position_long', 'enhanced_altitude', 'cadence']

    # keep only the required columns; copy so the assignments below do not modify a view of df
    res  = df[columns].copy()

    # parse the columns to the required datatypes
    res['timestamp'] = pd.to_datetime(res['timestamp'])
    res['date'] = res['timestamp'].dt.date
    res['date'] = pd.to_datetime(res['date'])

    # subtract the next/previous value from the current one to find the start/end of an exercise
    res['next'] = res['distance'].diff(periods=-1)
    res['prev'] = res['distance'].diff()

    # assign a bucket to each exercise: the cumulative distance drops when a new
    # exercise starts, so counting the drops numbers the exercises 1, 2, 3, ...
    # (this also avoids running out of ids in a fixed-size list)
    res['bucket'] = (res['prev'] < 0).cumsum() + 1

    return res


### group the data to find the activity type and support the other analyses

def process_and_agg_data(raw_data):

    # get the group for different bucket
    groups  = raw_data.groupby(['bucket'] , as_index=False)

    # Distance traveled during each activity and scale up to KM.
    data = groups[['distance']].agg(lambda x :  0.001*(x.iloc[-1] - x.iloc[0]))

    ## Activity duration and convert it to hour
    data['duration'] = groups[['timestamp']].agg(lambda x :  (x.iloc[-1] - x.iloc[0]).total_seconds()/3600)['timestamp']

    ## calculate the average speed for the activity; we will use it to identify the activity type
    data['avg_speed'] = data['distance'] / data['duration']

    # start time of activity
    data['start_time'] = groups[['timestamp']].first()['timestamp']

    #minimum heart rate during activity 
    data['min_hr'] = groups[['heart_rate']].min()['heart_rate']

    #maximum heart rate during activity
    data['max_hr'] = groups[['heart_rate']].max()['heart_rate']

    return data


# update the data with activity type
def data_updated_activity(file_path):

    raw_data = load_and_pre_process_data(file_path)
    
    data = process_and_agg_data(raw_data)

    # bicycle activity
    bicycle_buckets = data[data['avg_speed']>12]['bucket'].to_numpy()

    # update activity based on the assumption that average running speed is not more than 12 km/h
    raw_data['activity_type'] = np.where(raw_data['bucket'].isin(bicycle_buckets), 'Bicycle', 'Running')

    return raw_data
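
The segmentation and classification logic above can be checked on a small hand-made table. The sketch below uses synthetic timestamps and distances (not the Strava data) and hypothetical column names for the summary; it assumes, as the functions above do, that the cumulative distance drops whenever a new activity starts.

```python
import pandas as pd

# Toy log: cumulative distance (m) resets to 0 when a new activity starts.
log = pd.DataFrame({
    "timestamp": pd.to_datetime([
        "2019-07-01 06:00", "2019-07-01 06:15", "2019-07-01 06:30",  # activity 1
        "2019-07-02 06:00", "2019-07-02 06:15", "2019-07-02 06:30",  # activity 2
    ]),
    "distance": [0, 2500, 5000, 0, 6000, 12000],
})

# A negative step in cumulative distance marks the first row of a new activity.
log["bucket"] = (log["distance"].diff() < 0).cumsum() + 1

# Per-activity distance (km), duration (h) and average speed (km/h).
g = log.groupby("bucket")
summary = pd.DataFrame({
    "distance_km": (g["distance"].last() - g["distance"].first()) / 1000,
    "duration_h": (g["timestamp"].last() - g["timestamp"].first()).dt.total_seconds() / 3600,
})
summary["avg_speed"] = summary["distance_km"] / summary["duration_h"]

# Apply the 12 km/h assumption to label each activity.
summary["activity_type"] = summary["avg_speed"].gt(12).map({True: "Bicycle", False: "Running"})
print(summary)
```

Here the first activity covers 5 km in half an hour (10 km/h, labelled Running) and the second covers 12 km in half an hour (24 km/h, labelled Bicycle).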

Figure 1 - Visualizing Data Distribution for Heart Rate¶

Violin plots

Distribution insight - A violin plot provides a full view of the data distribution, including the probability density of heart rates at different levels. It also shows how heart rates are spread across the range of values.

Multiple Modes - If the heart rate data has multiple peaks (modes), a violin plot will show these effectively.

Aesthetic Appeal - Violin plots are visually appealing and can be more engaging to an audience, making it easier to convey our findings.

Clear Indicators of Density - The width of the violin at different heart rate levels represents the density of observations at that level, making it intuitive to see how common certain heart rate levels are.
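
The width of a violin is a kernel density estimate (KDE) of the data. As a rough illustration of how modes show up in such an estimate, the sketch below builds a simple Gaussian KDE with NumPy and locates its local maxima. The data are synthetic, and the bandwidth, grid and 20% threshold are arbitrary choices for this example; seaborn selects its own bandwidth internally.

```python
import numpy as np

def gaussian_kde_1d(samples, grid, bandwidth):
    """Average of Gaussian bumps centred on each sample, evaluated on grid."""
    z = (grid[:, None] - samples[None, :]) / bandwidth
    norm = len(samples) * bandwidth * np.sqrt(2 * np.pi)
    return np.exp(-0.5 * z ** 2).sum(axis=1) / norm

# Synthetic heart rates (bpm) with three clusters; not the Strava data.
rng = np.random.default_rng(0)
samples = np.concatenate([
    rng.normal(120, 4, 2000),
    rng.normal(140, 4, 3000),
    rng.normal(160, 4, 2000),
])

grid = np.arange(90.0, 191.0, 1.0)
density = gaussian_kde_1d(samples, grid, bandwidth=3.0)

# Local maxima of the density that rise clearly above the tails
inner = density[1:-1]
is_peak = (inner > density[:-2]) & (inner > density[2:])
is_peak &= inner > 0.2 * density.max()
modes = grid[1:-1][is_peak]
print(modes)
```

Each value in `modes` corresponds to a bulge in the violin; the tallest density value marks where the violin is widest.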

In [21]:
# Load data for violin plot
data = load_and_pre_process_data(FILE_PATH)

fig = plt.subplots(figsize = (9,7))
sns.violinplot(data=data , y='heart_rate').set(title = 'Distribution of Heart Rate' , ylabel = 'Heart Rate' )
plt.legend(['Min  - ' + str(data['heart_rate'].min()),
            'Max - ' + str(data['heart_rate'].max()),
            'Mean - ' + str(int(data['heart_rate'].mean()))]
           , fontsize = 'x-large')
plt.show()

Analysis of Violin Plot¶

The violin plot above clearly illustrates the distribution of heart rate data. The analysis reveals that the heart rate density peaks around 120, 140, and 160, with the highest density occurring at approximately 140. This observation is further supported by the mean heart rate of 134 bpm.

Figure 2 - Correlation Between Power and Heart Rate¶

Scatter plot

Identifying Patterns - Scatter plots display individual data points on two axes (one for heart rate and the other for power output). This allows us to visually assess the relationship between the two variables. We can quickly identify trends, clusters, or patterns, helping us to see how heart rate changes with varying power outputs.

Outlier Analysis - Scatter plots help identify outliers, i.e. data points that fall outside the general pattern of the results.

Variation Insights - The spread of points in the scatter plot indicates variability in heart rate responses at the same power output. This is helpful for understanding differences in fitness levels and conditions.

In [19]:
# load the data again to avoid any data collision if cells are not run in order
power_heart_rate  = load_and_pre_process_data(FILE_PATH)[['heart_rate', 'Power']].dropna().reset_index()

#add a new scatter plot 
fig = plt.subplots(figsize = (10,5))
sns.scatterplot(data = power_heart_rate , x = 'Power' , y = 'heart_rate').set(title = 'Scatterplot of heart rate vs power' , ylabel = 'Heart rate')

#regression analysis to draw a line
X = sm.add_constant(power_heart_rate['Power'])
model = sm.OLS(power_heart_rate['heart_rate'] , X).fit()

predict = model.predict(X)

plt.plot(power_heart_rate['Power'] ,predict , color = 'purple' , label = 'Regression line ')
plt.legend(loc="upper left")
plt.show()

Analysis of Scatter Plot¶

In this analysis, we examined the correlation between power output and heart rate. The scatter plot reveals a noticeable concentration of data points, indicating a relationship between them. As power output increases, heart rate tends to rise as well, suggesting that higher levels of exertion correspond with elevated heart rates, which is consistent with what would be expected in physical activity.
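
The strength of such a relationship can be summarised with a Pearson correlation coefficient and the slope of a least-squares line. The sketch below shows the calculation on synthetic data (not the Strava data); the 0.2 bpm per watt slope and the noise level are assumptions made purely for illustration.

```python
import numpy as np

# Synthetic example: heart rate rises about 0.2 bpm per watt, plus noise.
rng = np.random.default_rng(42)
power = rng.uniform(100, 400, 500)
heart_rate = 100 + 0.2 * power + rng.normal(0, 8, 500)

# Pearson correlation: +1 perfect positive, 0 none, -1 perfect negative
r = np.corrcoef(power, heart_rate)[0, 1]

# Least-squares line: heart_rate = slope * power + intercept
slope, intercept = np.polyfit(power, heart_rate, deg=1)
print(f"r = {r:.2f}, slope = {slope:.3f} bpm per watt")
```

Applied to the real `Power` and `heart_rate` columns, the same two numbers would put a figure on how strongly heart rate tracks power.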

Figure 3 - Effects of Exercise on Heart Rate¶

Line Plot - Line plots are well suited to time series data, where the goal is to find patterns and trends over time.

I will plot the minimum and maximum heart rate for every exercise session to look for a pattern or trend in the data.

In [13]:
# get agg data 
running  = process_and_agg_data(load_and_pre_process_data(FILE_PATH))

### The analysis treats an average speed below 12 km/h as running.
running = running[running['avg_speed']<12]


fig = plt.subplots(figsize = (12,8))

#use custom ticks window of 5  
y_ticks = [5 * i for  i in  range(40)]

x_ticks = running['start_time'].dt.date.apply(lambda x : str(x))

#plot min heart rate overtime for activity
sns.lineplot(x = 'start_time' ,  y = 'min_hr' ,  data = running,
             linewidth = 2.5, marker = 'o' , markersize = 10).set(
    title = 'Heart rate over time' , xlabel = 'Exercise date', ylabel = 'Heart rate' ,  yticks = y_ticks , xticks = x_ticks)

#plot max heart rate overtime for activity
sns.lineplot(x = 'start_time' ,  y = 'max_hr' ,  data = running,
             linewidth = 2.5, marker = 'o' , markersize = 10).set(
    title = 'Heart rate over time' , xlabel = 'Exercise date', ylabel = 'Heart rate' ,  yticks = y_ticks, xticks = x_ticks)

# target heart rate zone during exercise is roughly 90-153 bpm for age group 40-45

# Plot line for the lower heart rate limit for the age group
sns.lineplot(x = 'start_time' ,  y = 90 ,  data = running,
             linewidth = 2.5 , label = 'Lower heart rate for age group 40-45')

## Plot line for the upper heart rate limit for the age group
sns.lineplot(x = 'start_time' ,  y = 153 ,  data = running,
             linewidth = 2.5 , label = 'Upper heart rate for age group 40-45')

# rotate the ticks for a cleaner x axis
plt.xticks(rotation=85)
line = plt.gca().lines
plt.fill_between(line[0].get_xdata(),line[0].get_ydata(), line[1].get_ydata(), color='grey', alpha=.5)
plt.legend(loc="upper left")
plt.show()
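
The 90-153 bpm band drawn in the plot is consistent with a widely used rule of thumb: estimate the maximum heart rate as 220 minus age, then take 50% to 85% of it as the target zone. A minimal sketch of that arithmetic follows; the function name is illustrative, and the formula is a population-level approximation rather than a clinical threshold.

```python
def target_heart_rate_zone(age, low=0.50, high=0.85):
    """Return (lower, upper) target heart rate in bpm for a given age."""
    max_hr = 220 - age  # rule-of-thumb maximum heart rate
    return round(low * max_hr), round(high * max_hr)

print(target_heart_rate_zone(40))  # (90, 153)
```

For age 40 the estimated maximum is 180 bpm, which gives the 90 and 153 bpm limits used as reference lines.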

Result¶

  • The analysis shows that both the lower and upper bounds of heart rate have increased over two months of exercise. Specifically, the minimum heart rate after a month of exercise has risen. Because this is the lowest value recorded during a session rather than a true resting measurement, it should be read as a tentative indicator of a change in cardiovascular fitness.
  • Simultaneously, the maximum heart rate has also increased, suggesting that higher levels of intensity can now be sustained without the same physiological stress as before.

Figure 5 - Heart Rate and Distance¶

Line plot - Visualize distance and heart rate for the different activities to see how each activity type affects heart rate.

In [6]:
#load data and update activity type
data = data_updated_activity(FILE_PATH)

fig = plt.subplots(figsize = (15,8))
g = sns.lineplot(x = 'distance' ,  y = 'heart_rate' , data = data, hue = 'activity_type', style = 'activity_type',
             linewidth = 2.5, marker = 'o' , markersize = 10).set(
    title = 'Line plot - distance and heart rate for different activity' , xlabel = 'distance covered in meter', ylabel = 'Heart rate')
plt.show()

Visual observation for different activity types¶

  • Running - Fluctuation in heart rate is very high during running.
  • Bicycle - Fluctuation is low while riding the bicycle, and it decreases as distance increases.
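
The visual impression of "high" versus "low" fluctuation can be backed by a number, for example the standard deviation and range of heart rate per activity type. The sketch below uses a few synthetic readings (not the Strava data) to show the calculation; the same `groupby` could be applied to the DataFrame returned by `data_updated_activity`.

```python
import pandas as pd

# Synthetic readings (bpm) for illustration only.
readings = pd.DataFrame({
    "activity_type": ["Running"] * 6 + ["Bicycle"] * 6,
    "heart_rate": [120, 150, 135, 165, 125, 160,
                   138, 142, 140, 141, 139, 140],
})

# Standard deviation and range per activity: larger values mean more fluctuation.
spread = readings.groupby("activity_type")["heart_rate"].agg(["std", "min", "max"])
spread["range"] = spread["max"] - spread["min"]
print(spread)
```

In this toy data the running readings span 45 bpm while the cycling readings span only 4 bpm, which is the kind of gap the line plot suggests.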

Figure 6 - Heart Rate, Power and Cadence Relationship¶

3D Plot - A 3D scatter plot allows us to add a third dimension to the analysis/visualization. We will visualize three variables (Power, heart_rate and cadence) to find the relationship between them.

In [7]:
def plot_3d(data):

    # add a new figure with size 6 6
    fig = plt.figure(figsize = (6,6))
    ax = fig.add_subplot(projection='3d')
    
    #add a 3d scatter plot 
    artists=ax.scatter(data["Power"], data["heart_rate"], data["cadence"],
                   s=5, c=data["cadence"], cmap='Blues')

    # add label to colorbar (the colour encodes cadence)
    plt.colorbar(artists).set_label("Cadence")

    # add labels to the axes, matching the plotted columns
    ax.set_xlabel('Power (watts)')
    ax.set_ylabel('Heart rate')
    ax.set_zlabel('Cadence')
    plt.show()

data = data_updated_activity(FILE_PATH)[['heart_rate' , 'Power' , 'bucket' , 'cadence']].dropna()
for bucket in data['bucket'].unique():
    bucket_data = data[data['bucket'] == bucket]
    plot_3d(bucket_data)

Result - 3d plot¶

The plot indicates that as cadence increases, there is a corresponding increase in both heart rate and power output. This concentration suggests a predictable pattern of response during physical activities, where higher cadence results in elevated heart rates and higher power.
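
A pairwise correlation matrix is a compact numerical companion to the 3D view, since it reports how strongly each pair of variables moves together. The sketch below uses synthetic data in which power depends on cadence and heart rate depends on power; the coefficients and noise levels are assumptions for illustration only, and only the column names mirror the notebook.

```python
import numpy as np
import pandas as pd

# Synthetic data: power follows cadence, heart rate follows power (plus noise).
rng = np.random.default_rng(1)
cadence = rng.uniform(60, 100, 400)
power = 2.5 * cadence + rng.normal(0, 15, 400)
heart_rate = 90 + 0.25 * power + rng.normal(0, 5, 400)

frame = pd.DataFrame({"cadence": cadence, "Power": power, "heart_rate": heart_rate})

# Pearson correlation for every pair of columns
corr = frame.corr()
print(corr.round(2))
```

Values close to 1 off the diagonal would support the reading that the three quantities rise together.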

Figure 7 - Geographical Information Systems¶

We will use the GIS library folium to visualize the route of every activity on a map. A GIS plot provides an interactive way to explore geographical data, and we will use it to explore details of the route taken during exercise. Folium supports interactive maps, which we can use to explore the area near an activity and find an optimal route for the next one.
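
The map code in this notebook multiplies the position columns by 180 / 2**31 before plotting. That factor converts "semicircles", the integer unit used by Garmin FIT files, into degrees. The sketch below shows the conversion on its own; the function name is illustrative, and it assumes the position columns in the CSV are stored in semicircles, as the scaling factor implies.

```python
def semicircles_to_degrees(semicircles):
    """Convert a FIT-style semicircle coordinate to decimal degrees."""
    # 2**31 semicircles correspond to 180 degrees
    return semicircles * (180 / 2 ** 31)

print(semicircles_to_degrees(2 ** 30))     # 90.0
print(semicircles_to_degrees(-(2 ** 31)))  # -180.0
```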

In [8]:
# prepare map attribute 
def prepare_gis_map(data , title):


    # prepare the data for the map: scale the stored positions to degrees
    data  = data * (180 / 2**31 )

    start = [data["position_lat"].iloc[0], data["position_long"].iloc[0]]
    end   = [data["position_lat"].iloc[-1], data["position_long"].iloc[-1]]
    
    # start lat/long
    m = folium.Map(location= start , zoom_start=14)
    
    #add title to map
    title_html = f'<h3 align="center" style="font-size:22px;color:blue; font-family:Brush Script MT,cursive"><b>{title} </b></h3>'
    m.get_root().html.add_child(folium.Element(title_html))
    
    # add a start marker
    folium.Marker(start, popup="Start" , icon=folium.Icon("green") ).add_to(m)

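    # add an end marker (red, to distinguish it from the green start marker)
    folium.Marker(end,   popup="Stop" , icon=folium.Icon(color="red")).add_to(m)
    
    # draw the route; build a list so folium receives concrete (lat, long) pairs
    route = folium.PolyLine(locations=list(zip(data["position_lat"], data["position_long"])),
                    weight=5, color='blue').add_to(m)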
   
    return m 

data = data_updated_activity(FILE_PATH)[['position_lat' , 'position_long' , 'bucket' , 'activity_type' , 'timestamp']].dropna()
for bucket in data['bucket'].unique():
    bucket_data = data[data['bucket'] == bucket]
    title = f'''Activity start time - {bucket_data['timestamp'].iloc[0]} 
              Activity Type - {bucket_data['activity_type'].iloc[0]}'''
    display(prepare_gis_map(bucket_data[['position_lat' , 'position_long' ]] , title))

Conclusions and Recommendations¶

  • The data indicates that heart rate fluctuates considerably more during running than during bicycling.
  • In summary, our analysis confirms that increased physical activity correlates with higher heart rates during exercise, supporting previous knowledge about exercise intensity and heart rate response. Furthermore, the shift in minimum heart rate after one month of exercise points to an effect of regular physical activity on cardiovascular health, though the minimum recorded during a session is only a proxy for resting heart rate.
  • The data shows a positive relationship between activity intensity (power) and heart rate, which is relevant for personal health monitoring.
  • I would recommend personalized heart rate monitoring during workouts. When engaging in high-intensity training, we should be aware of our heart rate ranges to avoid overexertion. We should monitor our heart rate during exercise and, if it deviates far from the recommended limits, pause and rest. The activity should be resumed only once the heart rate is back to normal.
  • While our analysis supports general trends in heart rates, further research could investigate the impact of demographic factors such as age or fitness levels.
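
The monitoring recommendation can be expressed as a simple check. The sketch below flags readings that fall outside a target band; the function name and the sample session are hypothetical, and the default 90-153 bpm limits are the ones assumed in this notebook for ages 40-45.

```python
def readings_outside_zone(heart_rates, lower=90, upper=153):
    """Return (index, bpm) pairs for readings outside the [lower, upper] band."""
    return [(i, bpm) for i, bpm in enumerate(heart_rates)
            if bpm < lower or bpm > upper]

# Hypothetical session: two readings exceed the upper limit
session = [95, 120, 148, 158, 162, 150, 130]
flagged = readings_outside_zone(session)
print(flagged)  # [(3, 158), (4, 162)]
```

A non-empty result would be the cue to pause and rest until the heart rate returns to the band.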